|
Poster:
|
Tom Gally |
Date:
|
December 11, 2009 06:20:16pm |
|
Forum:
|
texts
|
Subject:
|
Missing images from Google book scans |
Nearly every day, I check the following two URLs to see what books have been added to the Americana text collection, and nearly every day I come across unusual, interesting, and wonderful books.
http://www.archive.org/search.php?query=%28collection%3Aamericana%20AND%20format%3Apdf%20AND%20mediatype%3Atexts%29%20AND%20-mediatype%3Acollection&sort=-publicdate
http://www.archive.org/search.php?query=-description%3A%28Google%29%20AND%20%28collection%3Aamericana%20AND%20format%3Apdf%20AND%20mediatype%3Atexts%29%20AND%20-mediatype%3Acollection&sort=-publicdate
The first URL lists all of the most recent additions to the Americana collection, while the second URL excludes those taken from Google (nearly all added, it seems, by user tpb, who I presume is a bot). Even though it returns many fewer books, I prefer the latter URL because, too often, the scans from Google are bad in multiple ways--folded-over pages, visible fingers, resolution too coarse for the text to be read, and, worst of all, omitted images.
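For anyone who wants to build these two searches programmatically, here is a minimal Python sketch. The names `BASE`, `AMERICANA`, and `search_url` are made up for illustration; the only thing taken from the URLs above is the query text itself, and the sketch assumes the search page keeps accepting `query` and `sort` parameters in this form.

```python
from urllib.parse import quote

# Search endpoint as it appears in the two URLs above.
BASE = "http://www.archive.org/search.php"

# Shared part of both queries: PDF texts in the Americana collection,
# excluding items that are themselves collections.
AMERICANA = (
    "(collection:americana AND format:pdf AND mediatype:texts)"
    " AND -mediatype:collection"
)


def search_url(query, sort="-publicdate"):
    """Build a search URL (hypothetical helper, not an official API)."""
    # safe="" makes quote() percent-encode colons, parentheses, and spaces.
    return f"{BASE}?query={quote(query, safe='')}&sort={sort}"


# All recent additions, newest first.
all_recent = search_url(AMERICANA)

# Same search, minus items whose description mentions Google.
non_google = search_url("-description:(Google) AND " + AMERICANA)

print(all_recent)
print(non_google)
```

The only difference between the two searches is the leading `-description:(Google)` clause, which drops any item whose description mentions Google.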
Examples of missing images, all from books taken from the former URL a few minutes ago, can be seen here:
http://www.archive.org/stream/economicmininga01lockgoog#page/n410/mode/2up
http://www.archive.org/stream/edinburghphilos10edingoog#page/n381/mode/2up
http://www.archive.org/stream/earthadescripti01reclgoog#page/n34/mode/2up
Presumably Google removed these images in order to improve the accuracy of OCR conversion, but it's a shame that these files are being added in such large numbers to the Internet Archive when, one would hope, better scans must be available somewhere.
Does anyone know why these defective versions, rather than versions with the illustrations intact, are being added? Do the libraries at which the books were scanned (Harvard University, University of California, etc.) know that defective versions of their books are being added to the Internet Archive? Can anything be done to replace those scans with better ones?
|
Poster:
|
stringybark |
Date:
|
December 11, 2009 07:15:40pm |
|
Forum:
|
texts
|
Subject:
|
Re: Missing images from Google book scans |
In my experience, the Google digitisation is done with quantity rather than quality in mind. Eventually I concluded that they have an 'acceptable level of defect' policy where, provided only around 1 percent of pages are spoiled, the quality standards are met. So all "user tpb" books have a few dud pages. The contrast in quality is stark compared to the older, far better quality, MSN-sponsored digitisation.
|
Poster:
|
Time Traveller |
Date:
|
December 11, 2009 10:13:36pm |
|
Forum:
|
texts
|
Subject:
|
Re: Missing images from Google book scans |
Seeing that they might be losing the rights to lots of books soon, it might be that the more they have online today, the more rights they have to keep these books.
But they are a profit-making venture of sorts, while volunteers do the scanning for the IA: a labour of love as opposed to a salary from Google.
Peter
|
Poster:
|
stbalbach |
Date:
|
December 11, 2009 10:09:17pm |
|
Forum:
|
texts
|
Subject:
|
Re: Missing images from Google book scans |
Concur with stringybark. Also there's been a problem with public-domain books disappearing from Google's full-view access. Presumably Google wants to sell those books in the future once it resolves the legalities, although there may be other reasons. User tpb is doing a great service by getting books off Google ASAP while it's still possible, before they disappear behind limited-preview mode (i.e., a paywall). Even if they are poor quality, it's better than nothing!
Stephen
|
Poster:
|
Time Traveller |
Date:
|
December 11, 2009 11:10:10pm |
|
Forum:
|
texts
|
Subject:
|
Re: Missing images from Google book scans |
The arrangement that Google has, where it owns out-of-copyright books or something until authors opt out, is causing lots of discussion and legal action.
It may be that, with it all in the news media, authors are opting out because they did not know before how Google was using their book(s).
Peter